Aquincola agrisoli sp. nov., isolated from rhizospheric soil of eggplant and in silico genome mining for the prediction of biosynthetic gene clusters

Abstract A Gram-stain-negative, aerobic, rod-shaped, motile and flagellated novel bacterial strain, designated MAHUQ-54T, was isolated from the rhizospheric soil of eggplant. Colonies were light pink, smooth, spherical and 0.2–0.6 mm in diameter when grown on R2A agar medium for 2 days. MAHUQ-54T was able to grow at 15–40 °C, at pH 5.5–9.0 and in the presence of 0–0.5 % NaCl (w/v). The strain was positive for catalase and oxidase and for hydrolysis of l-tyrosine, urea, Tween 20 and Tween 80. On the basis of 16S rRNA gene sequence comparisons, the isolate was identified as a member of the genus Aquincola and was most closely related to Aquincola tertiaricarbonis L10T (98.8 % sequence similarity) and Leptothrix mobilis Feox-1T (98.2 %). MAHUQ-54T has a draft genome size of 5 994 516 bp (60 contigs), annotated with 5348 protein-coding genes, 45 tRNA and 5 rRNA genes. The average nucleotide identity (ANI) and digital DNA–DNA hybridisation (dDDH) values between MAHUQ-54T and its closest phylogenetic neighbours were 75.8–83.3 and 20.8–25.3 %, respectively. In silico genome mining predicted eight biosynthetic gene clusters with only low similarity to previously identified clusters, suggesting that MAHUQ-54T has the potential to produce novel natural products. The genomic DNA G+C content was determined to be 70.4 %. The predominant isoprenoid quinone was ubiquinone-8. The major fatty acids were identified as C16 : 0, summed feature 3 (comprising C16 : 1ω7c and/or C16 : 1ω6c) and summed feature 8 (comprising C18 : 1ω7c and/or C18 : 1ω6c). On the basis of dDDH and ANI values, genotypic analysis, and chemotaxonomic and physiological data, strain MAHUQ-54T represents a novel species within the genus Aquincola, for which the name Aquincola agrisoli sp. nov. is proposed, with MAHUQ-54T (=KACC 22001T = CGMCC 1.18515T) as the type strain.


INTRODUCTION
The genus Aquincola, first described by Lechner et al. [1], is a member of the Rubrivivax-Roseateles-Leptothrix-Ideonella-Aquabacterium branch of the class Betaproteobacteria [2]. The genus Aquincola presently comprises three species with validly published names: Aquincola tertiaricarbonis, isolated from methyl tert-butyl ether (MTBE)-contaminated groundwater in Germany [1] and a wastewater plant in France [3]; Aquincola amnicola, isolated from a freshwater river in Taiwan [4]; and Aquincola rivuli, isolated from a freshwater stream in Taiwan [5]. Cells are Gram-stain-negative, obligately aerobic, non-spore-forming, rod-shaped, motile by means of a single polar flagellum, and catalase- and oxidase-positive. Chemotaxonomically, cells possess Q-8 as the major respiratory quinone, summed feature 3 (comprising C16 : 1ω7c and/or C16 : 1ω6c) and C16 : 0 as the predominant fatty acids, and DNA G+C contents between 69.0 and 70.7 mol% [1–5]. In the present study, we report a Gram-stain-negative bacterium, MAHUQ-54T, which was isolated during the characterisation of the bacterial diversity in the rhizospheric soil of eggplant. Phylogenetic analyses based on 16S rRNA gene and genome sequences, together with polyphasic characterisation, revealed that this isolate represents a novel species of the genus Aquincola. The purpose of this study is to clarify the taxonomic position of MAHUQ-54T in detail on the basis of phenotypic characteristics and the results of chemotaxonomic and genotypic analysis. Moreover, the genome sequence of MAHUQ-54T was analysed for the presence of putative natural product biosynthetic gene clusters (BGCs). The availability of whole-genome sequences and synthetic biology-inspired tools and approaches makes it possible to utilise these BGCs to develop new chemicals with new structures, new activities and new targets [6]. Our data revealed that MAHUQ-54T contains BGCs, indicating a potential capability to produce new chemicals with biological activity.

ISOLATION AND CULTIVATION
During an investigation of bacterial biodiversity, a novel bacterium, designated MAHUQ-54T, was isolated from a sample of rhizospheric soil of an eggplant in Magura, Bangladesh. A 1 g quantity of the soil sample was suspended in 9 ml of sterile 0.85 % (w/v) NaCl solution. The suspension was serially diluted up to a 10⁻⁶ dilution and 200 µl of the suspension was spread onto Reasoner's 2A (R2A) agar plates (MB cell). The plates were incubated at 30 °C for 3 days. Single colonies were purified by repeated streaking on fresh R2A agar plates and preserved as a suspension in R2A broth containing glycerol (25 %, v/v) at −80 °C. On the basis of the results of 16S rRNA gene sequence analysis, MAHUQ-54T was shown to be a novel bacterium and was selected for detailed taxonomic studies. MAHUQ-54T has been deposited in the Korean Agricultural Culture Collection (KACC) and the China General Microbiological Culture Collection Centre (CGMCC).
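The dilution arithmetic behind this plating scheme can be sketched as follows. This is only an illustration: the study does not report colony counts, so the count used in the usage note is hypothetical, and the function name `cfu_per_gram` is our own. The sketch assumes that suspending 1 g of soil in 9 ml of saline is treated as the 10⁻¹ dilution.

```python
def cfu_per_gram(colonies: int, dilution_exponent: int,
                 plated_volume_ml: float = 0.2) -> float:
    """Estimate colony-forming units (CFU) per gram of soil from one plate.

    colonies          -- colonies counted on the plate (hypothetical input)
    dilution_exponent -- n for a plate spread from the 10^-n dilution
    plated_volume_ml  -- volume spread per plate (200 µl in the text)

    Assumption: 1 g of soil in 9 ml of saline counts as the 10^-1 dilution.
    """
    if colonies < 0 or dilution_exponent < 1 or plated_volume_ml <= 0:
        raise ValueError("invalid plate count, dilution or volume")
    dilution_factor = 10 ** dilution_exponent
    return colonies * dilution_factor / plated_volume_ml


if __name__ == "__main__":
    # Hypothetical plate: 50 colonies on the 10^-4 dilution plate.
    print(f"{cfu_per_gram(50, 4):.2e} CFU per gram")
```

Under these assumptions, a plate with 50 colonies from the 10⁻⁴ dilution would correspond to roughly 2.5 × 10⁶ CFU per gram of soil.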

16S rRNA GENE, GENOME AND PHYLOGENETIC ANALYSIS
Genomic DNA was extracted using a commercial genomic DNA extraction kit (Solgent). The 16S rRNA gene was amplified from the chromosomal DNA with the universal bacterial primer pair 27F (5′-AGAGTTTGATCCTGGCTCAG-3′) and 1492R (5′-GGTTACCTTGTTACGACTT-3′) [7], and the purified PCR products were sequenced by Solgent (Daejeon, Republic of Korea). The 16S rRNA gene sequences of related taxa were obtained from the GenBank database (http://blast.ncbi.nlm.nih.gov/Blast.cgi) and the EzBioCloud server (https://www.ezbiocloud.net) [8]. Multiple sequence alignments were performed using the Clustal_X programme [9]. Gaps were edited using the BioEdit programme [10]. The evolutionary distances were calculated using the Kimura two-parameter model [11]. The phylogenetic trees were reconstructed based on 16S rRNA gene sequences using the neighbor-joining (NJ) [12], maximum-likelihood (ML) and maximum-parsimony (MP) algorithms in the MEGA 7.0 programme [13], with bootstrap values based on 1000 replications. Phylogenetic trees were also reconstructed from whole-genome sequences based on multi-locus sequence analysis (MLSA; https://automlst.ziemertlab.com/analyze) [14]. The draft genome sequence of MAHUQ-54T was determined using a HiSeq X Ten (Illumina) and was assembled using the SOAPdenovo v. 3.10.1 de novo assembler. The genome annotation was performed using the NCBI prokaryotic genome annotation pipeline (PGAP). To estimate the degree of pairwise relatedness between MAHUQ-54T and the closest reference strains, BLAST-based average nucleotide identity (ANI) was calculated as described previously [15]. The digital DNA–DNA hybridisation (dDDH) value was determined using the genome-to-genome distance calculator (http://ggdc.dsmz.de/ggdc.php) according to the methods of Meier-Kolthoff et al. [16].
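As an illustration of the Kimura two-parameter model mentioned above, the distance between two aligned sequences is d = −½ ln[(1 − 2P − Q)√(1 − 2Q)], where P and Q are the proportions of transitional and transversional differences. The following is a minimal sketch, not the MEGA implementation: the function name is our own, and skipping gapped or ambiguous positions pair by pair is an assumption, since the gap treatment used in the study is not stated.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned sequences.

    Positions where either sequence has a gap or an ambiguous base are
    skipped (an assumption of this sketch, not a documented setting).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    s1 = seq1.upper().replace("U", "T")
    s2 = seq2.upper().replace("U", "T")

    compared = transitions = transversions = 0
    for a, b in zip(s1, s2):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguity code: ignore this column
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine

    if compared == 0:
        raise ValueError("no comparable positions")
    p = transitions / compared
    q = transversions / compared
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K2P model")
    return -0.5 * math.log(term1 * math.sqrt(term2))


if __name__ == "__main__":
    # Toy alignment with one transition in ten compared sites.
    print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))
```

A matrix of such pairwise distances is the kind of input a neighbor-joining algorithm works from; the toy sequences here are for demonstration only.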
According to the results of EzBioCloud server analysis of 16S rRNA gene sequences, the closest relatives of strain MAHUQ-54T were Aquincola tertiaricarbonis L10T (98.8 %) and Leptothrix mobilis Feox-1T (98.2 %). Similarities with all other strains were less than 97.6 %. The 16S rRNA gene sequence of MAHUQ-54T is a continuous stretch of 1448 bp (NCBI GenBank accession number MT514502). The relationship between MAHUQ-54T and other species was supported by the topology of the phylogenetic trees (Fig. 1; Figs S1 and S2, available in the online version of this article). The ML tree indicated that MAHUQ-54T clustered within the genus Aquincola and formed a monophyletic clade with Aquincola tertiaricarbonis L10T (Fig. 1). The ML tree was also supported by the trees created using the NJ and MP algorithms (Figs S1 and S2) with high bootstrap values. Moreover, the phylogenetic tree reconstructed from MLSA of whole-genome sequences indicated that MAHUQ-54T clustered with the members of the genus Aquincola and formed a monophyletic clade with Aquincola tertiaricarbonis L10T (Fig. S3). The results of phylogenetic analysis indicated that MAHUQ-54T is clearly grouped within the genus Aquincola.
The draft genome sequence of MAHUQ-54T yielded a genome of 6.0 Mb in length after assembly, producing 60 contigs with an N50 value of 359 003 bp. The total genome size is 5 994 516 bp. Gene prediction allowed the annotation of 5348 protein-coding genes with 45 tRNA and 5 rRNA genes. The genomic DNA G+C content of MAHUQ-54T, calculated directly from its genome sequence, was 70.4 %, which is within the range reported for the type strains of species of the genus Aquincola [1–5]. The genome sequence features of MAHUQ-54T are listed in Table S1. The ANI values between MAHUQ-54T and its phylogenetically close neighbours Aquincola tertiaricarbonis L10T and Leptothrix mobilis Feox-1T were 83.35 and 75.89 %, respectively (Table S2). The dDDH values between MAHUQ-54T and Aquincola tertiaricarbonis L10T and Leptothrix mobilis Feox-1T were 25.3 and 20.8 %, respectively (Table S2). These ANI and dDDH values are well below the species thresholds of 95–96 and 70 %, respectively, indicating that MAHUQ-54T represents a novel species [17–19]. On the basis of the dDDH results, ANI values and the results of phylogenetic analysis, it is evident that the isolated strain represents a novel species of the genus Aquincola.
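The assembly statistics and the threshold comparison described above can be sketched in a few lines. This is an illustrative sketch only: the helper names are our own, the toy contig lengths and sequences are invented for demonstration, and the lower bound (95 %) of the 95–96 % ANI range is used as the cut-off as a conservative assumption. Only the ANI and dDDH values in the usage section are taken from the text.

```python
from typing import Iterable, Sequence


def n50(contig_lengths: Sequence[int]) -> int:
    """Length L such that contigs of length >= L hold at least half the assembly."""
    if not contig_lengths:
        raise ValueError("no contigs given")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


def gc_percent(sequences: Iterable[str]) -> float:
    """G+C content (%) over all unambiguous bases in the given sequences."""
    gc = acgt = 0
    for seq in sequences:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        acgt += sum(s.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("no unambiguous bases")
    return 100.0 * gc / acgt


def below_species_thresholds(ani: float, dddh: float,
                             ani_cutoff: float = 95.0,
                             dddh_cutoff: float = 70.0) -> bool:
    """True if both values fall below the species delineation cut-offs."""
    return ani < ani_cutoff and dddh < dddh_cutoff


if __name__ == "__main__":
    # Toy data (invented) to show the two summary statistics.
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
    print(gc_percent(["GGCC", "ATAT"]))
    # Values reported in the text for the two closest neighbours.
    for name, ani, dddh in [("A. tertiaricarbonis L10T", 83.35, 25.3),
                            ("L. mobilis Feox-1T", 75.89, 20.8)]:
        print(name, below_species_thresholds(ani, dddh))
```

With the reported values, both comparisons fall below the cut-offs, which is consistent with the conclusion drawn in the text.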

COMPARATIVE GENOMIC STUDIES
For a whole-genome-based taxonomic analysis, the genome sequence data were uploaded to the Type (Strain) Genome Server (TYGS), a free bioinformatics platform accessible at https://tygs.dsmz.de (accessed 15 February 2024). The Genome BLAST Distance Phylogeny (GBDP) approach was also used to calculate dDDH values and to reconstruct phylogenetic trees using TYGS [20,21]. GBDP phylogenetic trees were reconstructed using both 16S rRNA sequences and whole-genome sequences. CGView (http://cgview.ca/) was used to generate a graphical representation of the BLAST result comparison of the available genomes with the genome of MAHUQ-54T. Taxonomic and functional research on microorganisms has increasingly relied upon genome-based data and methods [22]. The distribution of genes in the genome of MAHUQ-54T and the most closely related reference strain Aquincola tertiaricarbonis L10T was investigated using the Rapid Annotation using Subsystems Technology (RAST) server [23].
Using the GBDP method and tree builder service, phylogenetic trees of MAHUQ-54T were created from its 16S rRNA gene sequence and whole-genome sequence. The GBDP phylogenetic tree reconstructed from 16S rRNA sequences indicated that MAHUQ-54T clustered with the members of the genus Aquincola and formed a monophyletic clade with Aquincola tertiaricarbonis L10T (Fig. S4). Similarly, the GBDP phylogenetic tree reconstructed from whole-genome sequences indicated that MAHUQ-54T clustered with the members of the genus Aquincola and formed a monophyletic clade with Aquincola tertiaricarbonis L10T (Fig. S5). Taken together, the 16S rRNA-based and genome-based GBDP phylogenetic trees, the whole-genome alignment and the results of comparative genome analysis (pairwise comparisons of user genomes vs type strain genomes, Table S3) indicated that MAHUQ-54T represents a novel species belonging to the genus Aquincola. Fig. 2 shows the circular chromosome map of MAHUQ-54T, generated from its genome sequence using the CGView server (http://cgview.ca/), a web-based tool for comparative genomic analysis of circular genomes [24]. The RAST functional annotations of the draft genome of MAHUQ-54T indicated that 201 genes were involved in protein metabolism, 337 genes were associated with the metabolism of amino acids and derivatives, 82 genes were involved in DNA metabolism, 280 genes were linked with carbohydrate metabolism and 187 genes were involved in the metabolism of vitamins, cofactors and pigments. Moreover, the genome of MAHUQ-54T revealed 81 gene clusters for stress response and 116 genes for respiration (Table S4). The genome of MAHUQ-54T has 36 genes for motility and chemotaxis (Table S4). The presence of genes for flagellar motility together with the observed flagella (Fig. S6) indicated that the phenotypic and genomic results are consistent with each other.
RAST functional analysis revealed that the genome of the most closely related type strain, Aquincola tertiaricarbonis L10T, contains the same categories of genes, but with quantitative differences (Table S4). For example, the genome of MAHUQ-54T contains 39 genes responsible for virulence, disease and defence, whereas the genome of Aquincola tertiaricarbonis L10T contains 36 genes in this category. Similarly, the genome of MAHUQ-54T contains 36 genes responsible for regulation and cell signalling, whereas Aquincola tertiaricarbonis L10T contains 33 genes in this category (Table S4).
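The kind of per-category comparison described above can be tabulated with a short script. This is a minimal sketch: it uses only the two categories for which the text gives counts for both genomes (Table S4 itself is not reproduced here), and the function name is hypothetical.

```python
# Gene counts per RAST category, transcribed from the text above.
MAHUQ_54 = {
    "Virulence, disease and defence": 39,
    "Regulation and cell signalling": 36,
}
L10 = {
    "Virulence, disease and defence": 36,
    "Regulation and cell signalling": 33,
}


def category_differences(a: dict, b: dict) -> dict:
    """Return count(a) - count(b) for every category present in either genome.

    A category missing from one genome is treated as a count of 0,
    which is an assumption of this sketch.
    """
    return {cat: a.get(cat, 0) - b.get(cat, 0) for cat in sorted(set(a) | set(b))}


if __name__ == "__main__":
    for category, diff in category_differences(MAHUQ_54, L10).items():
        print(f"{category}: {diff:+d}")
```

Extending the two dictionaries with the remaining rows of Table S4 would give the full set of quantitative differences between the two genomes.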

SECONDARY METABOLITE BIOSYNTHETIC GENE CLUSTER PREDICTION
antiSMASH 7 [25], combined with ClusterBlast, ActiveSiteFinder, Cluster Pfam analysis and SubClusterBlast, was used as the main approach for finding and annotating secondary metabolite BGCs in the genome of Aquincola agrisoli MAHUQ-54T. Using antiSMASH 7.0, eight BGCs were predicted in the genome (Table 1 and Fig. 3). The predicted BGC types include those for nonribosomal peptide synthetase (NRPS), redox-cofactor, acyl amino acids, non-ribosomal peptide (NRP)-metallophore, NRPS, type I polyketide synthase (T1PKS), NRPS-like, ribosomally synthesised and post-translationally modified peptide (RiPP)-like, RiPP recognition element (RRE)-containing, terpene, NRPS, T1PKS and acyl amino acids (Table 1 and Fig. 3). All of these BGCs showed only a low degree of similarity to previously identified BGCs, suggesting that MAHUQ-54T has the potential to produce novel natural products.
In summary, as indicated by the phylogenetic trees, MAHUQ-54T represents a member of the genus Aquincola. In addition, the characteristics of MAHUQ-54T are consistent with descriptions of the members of the genus Aquincola with regard to morphological, biochemical and chemotaxonomic properties. However, MAHUQ-54T can be distinguished from the most closely related type strain not only by physiological and biochemical characteristics but also by low dDDH and ANI values. The results of this polyphasic comparison between MAHUQ-54T and its close phylogenetic neighbours indicated that MAHUQ-54T should be assigned to the genus Aquincola as the type strain of a novel species, for which the name Aquincola agrisoli sp. nov. is proposed.
The type strain, MAHUQ-54T (=KACC 22001T = CGMCC 1.18515T), was isolated from the rhizospheric soil of eggplant in Magura, Bangladesh. The genomic DNA G+C content of the type strain is 70.4 %. The NCBI GenBank accession numbers for the 16S rRNA gene and draft genome sequences of MAHUQ-54T are MT514502 and JAZIBG000000000, respectively.

Funding information
No external funding was received for this study.

Fig. 1. The maximum-likelihood (ML) tree based on the results of 16S rRNA gene sequence analysis showing the position of Aquincola agrisoli MAHUQ-54T and related species. Bootstrap values less than 50 % based on 1000 replications are not shown at branching points. 'Paenibacillus ginsengiterrae' DCY89 was used as an outgroup. Bar, 0.05 substitutions per nucleotide position.

Fig. 2. Schematic representation of the circular chromosome of MAHUQ-54T, created using the CGView server (http://cgview.ca/). The outermost circle displays the contigs, the middle circle displays the DNA G+C content plot and the innermost circle displays the G+C skew. Rulers on the inside and outside of the chromosome map indicate genome size.